Binary Neural Architecture Search
binary architecture search can be improved by the auxiliary binary architecture search [24].
BATS [24] designs a new search space specially tailored for the binary network and incor-
porates it into the DARTS framework.
Unlike the aforementioned methods, our work is driven by the performance discrepancy
between the 1-bit neural architecture and its real-valued counterpart. We introduce tangent
propagation to explore the accuracy discrepancy and further accelerate the search process
by applying the GGN to the Hessian matrix in optimization. Furthermore, we introduce a
novel decoupled optimization to address asynchronous convergence in such a differentiable
NAS process, leading to better performed 1-bit CNNs. The overall framework leads to a
novel and effective BNAS process.
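To make the GGN acceleration concrete, recall the standard Gauss-Newton decomposition (the textbook form, not a detail specific to this chapter): for a composite loss $L(w) = \ell(f(w))$,
\[
\nabla^2 L(w) \;=\; J_f(w)^\top \,\nabla^2 \ell(f)\, J_f(w) \;+\; \sum_i \frac{\partial \ell}{\partial f_i}\, \nabla^2 f_i(w) \;\approx\; J_f(w)^\top \,\nabla^2 \ell(f)\, J_f(w),
\]
where $J_f$ is the Jacobian of the network outputs with respect to the parameters. Dropping the second term, which involves the curvature of $f$ itself, leaves a matrix that is positive semi-definite for a convex outer loss $\ell$ and far cheaper to form, which is what makes a Hessian-based search step affordable.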
To review advances in this area, we separately introduce representative
works on NAS and binary NAS in the following.
4.2 ABanditNAS: Anti-Bandit for Neural Architecture Search
Low search efficiency has prevented NAS from its practical use, and the introduction of
adversarial optimization and a larger search space further exacerbates the issue. Early work
directly regards network architecture search as a black-box optimization problem in a dis-
crete search space and takes thousands of GPU days. To reduce the search space, a common
idea is to adopt a cell-based search space [307]. However, when it comes to searching in a
huge and complicated search space, prior cell-based works may still suffer from memory
issues and become computationally intensive as the number of meta-architectures grows. For example,
DARTS [151] can only optimize over a small subset of 8 cells, which are then stacked to
form a deep network of 20 cells. We reformulate NAS as a multi-armed bandit problem with a
vast search space to increase search efficiency. The multi-armed bandit algorithm aims
to identify the best arm over a sequence of trials while balancing the observed reward
against its uncertainty. Likewise, NAS seeks the best operation from an operation pool
at each edge of the model within a finite number of optimization steps; both are
exploration-exploitation problems. We therefore introduce the multi-armed bandit
algorithm into NAS. In addition, the multi-armed bandit algorithm avoids
gradient descent and thus offers high search speed for NAS. Unlike traditional
Upper Confidence Bound (UCB) bandit algorithms that prefer to sample using UCB and
focus on exploration, we propose Anti-Bandit to further exploit both UCB and the Lower
Confidence Bound (LCB) to balance exploration and exploitation. This yields an
accuracy-bias trade-off during the search process when estimating operation performance,
which is desirable for quickly identifying the optimal architecture using test
performance. With the help of the
Anti-Bandit algorithm, our Anti-Bandit NAS (ABanditNAS) [34] can handle the vast and
complicated search space, where the number of operations that define the space can be 960!
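The UCB/LCB mechanics behind this pruning can be sketched in a few lines. The following is an illustrative toy, not the authors' implementation: the class and function names and the scale constant c = 1.0 in the confidence radius are our assumptions. Each arm is a candidate operation on an edge; UCB/LCB take the standard form mean ± sqrt(2 ln(total) / n_k). Sampling favors the arm with the lowest LCB, so under-tested arms get more chances, and pruning abandons the arm with the lowest UCB once every arm has enough trials.

```python
import math

def confidence_radius(n_k, total, c=1.0):
    """Half-width of the confidence interval for an arm tried n_k times."""
    return c * math.sqrt(2.0 * math.log(total) / n_k)

class AntiBandit:
    """Toy Anti-Bandit over one edge's operation pool (illustrative only)."""

    def __init__(self, num_arms):
        self.means = [0.0] * num_arms   # running mean reward per arm
        self.counts = [0] * num_arms    # number of trials per arm
        self.active = set(range(num_arms))

    def select(self):
        """Sample the arm with the lowest LCB: under-tested arms get priority."""
        total = max(sum(self.counts), 1)
        def lcb(k):
            if self.counts[k] == 0:
                return float("-inf")    # untried arms are sampled first
            return self.means[k] - confidence_radius(self.counts[k], total)
        return min(self.active, key=lcb)

    def update(self, k, reward):
        """Incorporate one observed reward (e.g., validation accuracy) for arm k."""
        self.counts[k] += 1
        self.means[k] += (reward - self.means[k]) / self.counts[k]

    def prune(self, min_trials):
        """Abandon the arm with the worst UCB once all arms are well tested."""
        if len(self.active) > 1 and all(self.counts[k] >= min_trials
                                        for k in self.active):
            total = sum(self.counts)
            worst = min(self.active,
                        key=lambda k: self.means[k]
                        + confidence_radius(self.counts[k], total))
            self.active.discard(worst)
```

As more samples accumulate, the confidence radius shrinks, so LCB rises and UCB falls toward the running mean, matching the convergence behavior described in the text.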
Specifically, our proposed Anti-Bandit algorithm uses UCB to reduce search space, and
LCB guarantees that every arm is thoroughly tested before abandoning it, as shown in
Figure 4.1. Based on the observation that the operation that is optimal early on is not
necessarily optimal in the end, while the worst operations in the early stage usually
remain worse in the end [291], we prune the operations with the worst UCB after they
have received enough trials, where sampling is guided by the worst LCB. This means that
the operations we finally retain form a near-optimal solution. The more tests that are conducted, the closer UCB and
LCB are to the average value. Therefore, LCB tends to increase and UCB decreases with
increasing sampling times. Specifically, operations with poor performance in the early stages,
such as parameterized operations, will receive more opportunities but are abandoned once